Toggle between layers to view COVID-19 rates from NYC Health and the mask usage observed by the New York Times.

Click on an area for more info.

NYC Health (Dates)

Using data from NYC Health, these layers map city-wide COVID-19 case rates per 100,000 people in each ZCTA. The map includes weekly rates from July 13 to August 10, 2020. Mousing over a ZCTA shows the area name, borough, ZCTA, and case rate as designated by NYC Health.

New York Times Mask Observations (NYT Obs)

Using data from the New York Times article “Are New Yorkers Wearing Masks?”, this layer maps the mask usage rates observed by the Times’ reporters between July 27 and July 30, 2020. The additional NYT Obs layers show observed mask usage rates by perceived gender. The ZCTAs containing the intersections where the Times’ reporters made their observations were used to map observed mask usage rates and to compare them with the NYC Health data. Mousing over a ZCTA shows the area name, borough, and intersection of observation as reported by the Times. I identified the ZCTAs myself.

Link to article: https://www.nytimes.com/2020/08/20/nyregion/nyc-face-masks.html


Overview

Methodology

This report compares the observed mask usage rates in select areas of NYC with the COVID-19 data collected around the same time. The data is shown with visuals such as maps and charts for comparison across areas of NYC. Each area in this analysis is defined by its ZCTA, which is the unit used by NYC Health and can be inferred from the intersections named in the NY Times article. The dates were chosen because COVID-19-related symptoms are estimated to take up to two weeks to show.
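As a minimal sketch of how the date window can be derived, the snippet below builds weekly snapshot dates from two weeks before to two weeks after the first day of the Times' observations. The variable names are illustrative and not taken from the original analysis.

```r
# First day of the Times observation window, as stated in this report
obs_start <- as.Date("2020-07-27")

# Weekly snapshots from two weeks before to two weeks after that day
snapshot_dates <- seq(obs_start - 14, obs_start + 14, by = "week")

# Month-day labels for the five snapshots
print(format(snapshot_dates, "%m-%d"))
```

These labels line up with the dates of the NYC Health commits used in the Sources section.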

[FINISH SECTION LATER]

Motive

I wanted to see whether the observed mask usage rates from the Times would correlate with the spread of COVID-19 in NYC, and I also wanted an opportunity to use data visualisations to compare related data sets.

Sources

NYC Health

https://www1.nyc.gov/site/doh/covid/covid-19-data.page

https://github.com/nychealth/coronavirus-data

NYC Health publishes its official COVID-19 data on its website and GitHub. I used the data-by-modzcta.csv data sets from the commits on the following dates: 07-13, 07-20, 07-27, 08-03, 08-10. Within these data sets, I considered the COVID_CASE_RATE and PERCENT_POSITIVE variables to determine the spread of COVID-19. Here’s an overview from NYC Health:

COVID_CASE_RATE - Rate of confirmed cases per 100,000 people by modified ZCTA

PERCENT_POSITIVE - Percentage of people ever tested for COVID-19 with a polymerase chain reaction (PCR) test who tested positive

I chose COVID_CASE_RATE because it is an easy-to-understand metric that adjusts for the population of each ZCTA.
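To illustrate why a population-adjusted rate is easier to compare than a raw count, here is a small sketch with two made-up areas. The area names, case counts, and populations are hypothetical and are not taken from the NYC Health data.

```r
# Hypothetical areas for illustration only; not real NYC Health figures
areas <- data.frame(
  area       = c("Area A", "Area B"),
  cases      = c(300, 200),
  population = c(100000, 40000)
)

# Cases per 100,000 people, the same scale that COVID_CASE_RATE is reported on
areas$case_rate <- areas$cases / areas$population * 100000

print(areas)
```

In this toy example Area B has fewer cases than Area A but a higher rate per 100,000 people, which is the kind of difference a raw count would hide.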

A major limitation of this data set, as with most data relating to COVID-19, is that it only includes detected and confirmed cases of the disease. There may be cases that contribute to the spread of the virus but are not included in this data set.

The New York Times - “Are New Yorkers Wearing Masks?”

https://www.nytimes.com/2020/08/20/nyregion/nyc-face-masks.html

The article from the New York Times shares the mask usage its reporters observed at 14 locations throughout New York City. These observations were made between July 27 and July 30, in the daytime between 09:00 and 19:00. According to the Times, 340 to 567 people were observed at each location, and only pedestrians were counted (people in cars or on skateboards or bikes were excluded). The data is presented with the intersection where the observations were made, an overall percentage of mask usage, and percentages of mask usage for men and women. Gender was determined by a person’s ‘apparent gender’. I manually compiled the Times’ observations across the city and assigned each one the ZCTA in which its intersection is located.

A major limitation of this data set is that it only provides one observation per location, so trends over time cannot be inferred. Observations were made only at specific intersections and are not necessarily representative of the entire ZCTA. The Times stated that its observation spots were chosen for their expected population density.

‘Are New Yorkers Wearing Masks’ Data

Visual

[Bar chart: observed mask usage rates by area, from the Times]

This chart depicts the observed mask usage rates from the Times. The data is a percentage of the observed population in these areas that were seen using masks. Some notable highlights include the very high mask usage in Flushing, an area with strong ties to mainland China, and in areas like Corona that were heavily impacted by the spread of COVID 19. It should be noted that Rockaway Beach is an outlier as it is a beach, while the other locations were busy intersections.

NYC Health - Case Rate per 100,000 people

Visual

[Scatter chart: COVID-19 case rate per 100,000 people by ZCTA, from NYC Health]

Background

This chart visualises COVID 19 case rates per 100,000 people in the same ZCTAs that the Times reported on. As mentioned earlier, data is taken from 2 weeks before and after the Times reporting on July 27-30 and samples data from the following dates: July 13, July 20, July 27, August 3, and August 10. The ZCTA with the highest case rate within this set is Corona (11368), which borders the ZCTA with the highest case rate in NYC (East Elmhurst, 11369). Another observation is that the two locations reported on in Manhattan (Harlem, East Village) have relatively lower case rates than the rest of the set.

Analysis

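# For each ZCTA, compute the relative change in case rate between consecutive
# weekly snapshots, then average those changes.
nyc.avg.caserate <- nyc.covid19.mask %>% 
  select("zip", "date","area", "COVID_CASE_RATE") %>% 
  dplyr::group_by(zip) %>% 
  dplyr::arrange(zip, date) %>% 
  # The first snapshot in each ZCTA has no previous value, so its rate is NA.
  mutate(rate = (COVID_CASE_RATE - dplyr::lag(COVID_CASE_RATE))/dplyr::lag(COVID_CASE_RATE)) %>% 
  summarise(avg_rate = mean(rate, na.rm = TRUE))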
## `summarise()` ungrouping output (override with `.groups` argument)
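The average change computed above can be checked by hand on a single series. The sketch below uses base R and five made-up weekly case rates (hypothetical values, not taken from the data) to reproduce the same calculation: the relative change between consecutive snapshots, followed by the mean of those changes.

```r
# Hypothetical weekly case rates for one ZCTA, ordered by date
rates <- c(100, 102, 105, 105, 110)

# Relative change between consecutive snapshots: (current - previous) / previous
change <- diff(rates) / head(rates, -1)

# Mean of the four changes; the first snapshot has no previous value to compare with
avg_change <- mean(change)

print(change)
print(avg_change)
```

Five snapshots yield four changes, which matches what the lag-based version produces once the leading NA is dropped by na.rm = TRUE.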
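# Attach the average change to the Times data by ZCTA. Note that obs_mask is
# then replaced by its complement (1 - obs_mask).
nyc.avg.rate.mask <- left_join(nyt.data, nyc.avg.caserate, by = "zip") %>% 
  mutate(obs_mask = 1 - obs_mask)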
# Plot the mask measure and the average change in case rate for each area.
plot_ly(nyc.avg.rate.mask,
        x = ~area,
        y = ~obs_mask,
        type = 'scatter',
        mode = 'markers',
        name = 'Mask Rate',
        visible = TRUE) %>% 
  # The second trace inherits the data and x mapping from plot_ly().
  add_trace(y = ~avg_rate, name = 'Avg. Change Rate', visible = TRUE) %>% 
  layout(
    title = 'Observed Mask Rates & Average Change in COVID 19 Case Rate by Area',
    showlegend = TRUE,
    yaxis = list(title = "% of Masks Observed/Avg. Change in Case Rate", tickformat = "%"),
    xaxis = list(title = "Area"),
    hovermode = 'x'  # show hover labels for both traces at the same x position
)
avg_rate.mask_model <- lm(formula = avg_rate ~ obs_mask, data = nyc.avg.rate.mask)

summary(avg_rate.mask_model)
## 
## Call:
## lm(formula = avg_rate ~ obs_mask, data = nyc.avg.rate.mask)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0034172 -0.0009060 -0.0000578  0.0005096  0.0037204 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0121353  0.0009391  12.922 2.11e-08 ***
## obs_mask    -0.0094030  0.0030206  -3.113  0.00897 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.002089 on 12 degrees of freedom
## Multiple R-squared:  0.4468, Adjusted R-squared:  0.4007 
## F-statistic:  9.69 on 1 and 12 DF,  p-value: 0.008972
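The figures in this summary are related to one another, which gives a quick way to sanity-check the output. In a linear regression with a single predictor, the t value is the estimate divided by its standard error, the F-statistic is the square of that t value, and the correlation coefficient is the square root of R-squared carrying the sign of the slope. The sketch below recomputes these from the numbers printed above; the variable names are illustrative.

```r
# Values copied from the summary output above
estimate  <- -0.0094030  # slope for obs_mask
std_error <- 0.0030206   # standard error of the slope
r_squared <- 0.4468      # multiple R-squared
df_resid  <- 12          # residual degrees of freedom

t_value <- estimate / std_error               # t statistic of the slope
f_stat  <- t_value^2                          # F statistic on 1 and 12 DF
p_value <- 2 * pt(-abs(t_value), df_resid)    # two-sided p-value
r       <- sign(estimate) * sqrt(r_squared)   # Pearson correlation

print(c(t = t_value, F = f_stat, p = p_value, r = r))
```

The recomputed t, F, and p values should agree with the printed summary up to rounding, and r gives the correlation between obs_mask and avg_rate implied by the reported R-squared.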